Many complex systems can be represented as networks, and separating a network into communities
can simplify functional analysis considerably. Many approaches for finding communities have been
proposed recently, but none of them can determine definitively whether the communities found are
significant or trivial. In this paper, we propose an index to evaluate the significance of
communities in networks. The index is based on the similarity between the community structure of
the original network and that of the network after perturbation, and is defined by integrating
these similarities over all perturbation strengths. We test the index on many artificial and
real-world networks. The results show that the index is independent of the size of the network
and the number of communities. Moreover, we find that clear communities always exist in social
networks, but we do not find significant communities in protein interaction networks and
metabolic networks.
The study of community structure has become a very important part of research on complex
networks. Nodes belonging to a tight-knit community are more likely to have particular
properties in common. In social networks, communities usually represent different groups of
friends. In the World Wide Web, community analysis has uncovered thematic clusters. In
biochemical or neural networks, different communities may represent different functional
groups, and separating the network into such groups could simplify functional analysis
considerably. As a result, the identification of communities has been the focus of many recent
efforts. Two questions arise. The first is how to detect communities in networks; many
algorithms have been proposed in recent studies. The second question goes hand in hand with the
first: how should the detected communities be evaluated? We believe that some networks contain
clear communities while others do not. Yet almost all algorithms find some "community
structure" in a network, without considering whether that structure actually exists. Many
algorithms even find communities in random networks, which are considered to have none. This
situation calls for a discussion of "significant communities". Detecting communities in a given
network is meaningless if its community structure is not significant at all.

Scientists have tried to propose a universal index for evaluating partitions. Modularity Q was
introduced by Newman and Girvan as an index of community structure, and it is now widely
accepted as a measure of community structure. It is defined as $Q = \sum_r (e_{rr} - a_r^2)$,
where $e_{rr}$ is the fraction of links that connect two nodes inside community r, $a_r$ is the
fraction of links that have one or both vertices inside community r, and the sum extends over
all communities r in a given network. The larger the value of Q, the clearer the partition into
communities. Hence, the value of the modularity can be used as an index of the significance of
communities. Unfortunately, despite its obvious advantages, modularity has a problem of its
own: networks with strong community structure have high modularity, but not all networks with
high modularity have strong community structure. We only claim that Q is not a very good index
for evaluating the significance of community structure; we do not mean that maximizing Q cannot
detect community structure. Many empirical and numerical results show that maximizing
modularity Q is a good method for detecting communities. Therefore, in the following analysis
we still maximize Q to detect community structure.
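
As a hedged illustration of the modularity definition above, the following Python sketch evaluates Q for a given partition. It assumes the standard Newman-Girvan convention in which a_r is computed as the fraction of edge ends attached to nodes of community r. The function name and the input format (an edge list plus a node-to-community mapping) are illustrative choices, not part of the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum_r (e_rr - a_r**2) of a partition.

    edges: iterable of distinct undirected (u, v) pairs.
    community: dict mapping each node to its community label.
    """
    m = 0
    inside = defaultdict(int)  # edges with both ends in community r
    ends = defaultdict(int)    # edge ends attached to community r
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            inside[cu] += 1
    if m == 0:
        return 0.0
    # e_rr = inside[r] / m, a_r = ends[r] / (2 m)
    return sum(inside[r] / m - (ends[r] / (2 * m)) ** 2 for r in ends)
```

For two triangles joined by a single edge and split into the two triangles, this gives Q = 5/14; placing all nodes in one community gives Q = 0.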

Recently, Karrer, Levina, and Newman suggested a method for perturbing networks and reported
several phenomena concerning the robustness of community structure in networks [17].
Intuitively, if a network has distinct communities, its community structure should be robust
under perturbation. In this paper, we therefore develop a perturbation method and propose an
index that measures the significance of communities based on perturbing the network, in an
attempt to answer the second question above. In our method, we strengthen the perturbation
gradually, from rewiring only a small number of edges to rewiring all edges. For each
perturbation strength we obtain the similarity between the community structures of the original
and perturbed networks. Finally, we obtain our index by integrating these similarities over all
perturbation strengths. Using the index, we can evaluate whether a network has significant
communities. The method is described in detail in the following section. We apply it to many
kinds of networks and reach some interesting conclusions. We argue that social networks usually
have distinct and significant community structure, while metabolic networks also have community
structure, but a less clear one. Some of the protein interaction networks we tested, however,
have no significant communities.



I. METHOD 

Obtaining our index for a given network takes three steps. First, we detect the communities in
the original, unperturbed network. Second, we perturb a given proportion of the edges in the
network, detect the communities of the perturbed network, and calculate the similarity between
the two partitions (the communities of the original network and those of the perturbed
network). Third, we increase the proportion of perturbed edges little by little until all edges
are perturbed, repeating the second step each time and comparing the new communities with the
original ones. This yields a series of perturbation proportions together with the corresponding
similarity values. Finally, we sum the products of each similarity value and the corresponding
increment in the proportion of perturbation. If the increment is small enough, this procedure
amounts to numerical integration.
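
The final summation step can be sketched in Python as follows. This is a minimal sketch, not the authors' code: `similarity_at` is a hypothetical callable standing in for "perturb, detect communities, compare with the original partition", and the choice of a right-endpoint sum over p = dp, 2 dp, ..., 1 is an assumption. The default increment of 0.02 is the value the paper reports using.

```python
def perturbation_grid(dp=0.02):
    """Perturbation proportions p_k = k * dp for k = 1, ..., K, with p_K = 1."""
    steps = round(1.0 / dp)
    return [k * dp for k in range(1, steps + 1)]


def significance_index(similarity_at, dp=0.02):
    """Sum of S(p_k) * dp over the grid (a right-endpoint Riemann sum).

    similarity_at: hypothetical callable that returns the similarity between
    the original partition and the partition found at perturbation p.
    """
    return sum(similarity_at(p) * dp for p in perturbation_grid(dp))
```

With a similarity that decays linearly, `significance_index(lambda p: 1.0 - p)`, the sum evaluates to 0.49, close to the exact integral of 0.5.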

There are various ways to perturb the edges of a network. In this paper, we adopt a completely
random perturbation. Following the perturbation method introduced by Karrer, Levina, and
Newman [17], we keep the total number of edges unchanged, which makes the comparison of
partitions straightforward. Specifically, we go through each edge of the original network and
remove it with probability p; we then add the same number of edges at random between pairs of
nodes that are not connected after the removal. If p = 0, no edge is moved and the network is
identical to the original. If p = 1, all edges are moved and the process generates a random
graph that has no correlation with the original. For values of p between 0 and 1, the
perturbation generates networks in which some edges retain their original positions while the
others are moved to new positions. We apply a sequence of such perturbations: we do not only
perturb the network slightly, but strengthen the perturbation until all edges have been moved.
Furthermore, unlike Karrer, Levina, and Newman [17], we do not require the expected degree of
every node to remain the same as before. We argue that completely random perturbation is more
reasonable, simple, and efficient.
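
One possible reading of this perturbation procedure is sketched below in Python. It is an interpretation under stated assumptions rather than the authors' implementation: the graph is assumed to be simple and undirected, nodes are assumed to be sortable identifiers, and new edges are drawn uniformly from node pairs that are currently unconnected (which may include the position of an edge that was just removed). The function name `perturb` is illustrative.

```python
import random


def perturb(edges, nodes, p, rng=None):
    """Return a perturbed copy of a simple undirected graph as a set of edges.

    edges: iterable of distinct (u, v) pairs with u != v.
    nodes: iterable of all node identifiers (assumed sortable).
    p: probability with which each original edge is removed.
    """
    rng = rng or random.Random()
    nodes = list(nodes)
    original = {tuple(sorted(e)) for e in edges}
    if len(original) > len(nodes) * (len(nodes) - 1) // 2:
        raise ValueError("more edges than available node pairs")
    # Step 1: remove each original edge independently with probability p.
    perturbed = {e for e in original if rng.random() >= p}
    # Step 2: add random edges between unconnected pairs until the
    # total number of edges equals that of the original network.
    while len(perturbed) < len(original):
        u, v = rng.sample(nodes, 2)
        perturbed.add(tuple(sorted((u, v))))  # no effect if already present
    return perturbed
```

With p = 0 the function returns the original edge set, and for any p the number of edges is preserved, matching the two properties stated above.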

After detecting the community structure of the perturbed network, the question becomes how to
measure the similarity between the perturbed and the original communities. We consider the
normalized mutual information, which is based on information theory and described in Ref. [19],
to be a more discriminating measure. It is defined through a confusion matrix N, whose rows
correspond to the "real" communities of the unperturbed network and whose columns correspond to
the "found" communities. The element $N_{ij}$ of N is the number of nodes in real community i
that appear in found community j. The similarity between partitions A and B is then

$$I(A,B) = \frac{-2 \sum_{i=1}^{c_A} \sum_{j=1}^{c_B} N_{ij} \log\left(\frac{N_{ij}\, n}{N_{i\cdot} N_{\cdot j}}\right)}{\sum_{i=1}^{c_A} N_{i\cdot} \log\left(\frac{N_{i\cdot}}{n}\right) + \sum_{j=1}^{c_B} N_{\cdot j} \log\left(\frac{N_{\cdot j}}{n}\right)} \qquad (1)$$

where $c_A$ and $c_B$ are the numbers of communities in A and B, $N_{i\cdot}$ and $N_{\cdot j}$
are the sums of row i and column j of the confusion matrix, and n is the total number of nodes.



As the discrepancy between the partitions increases, the value of I(A, B) decreases from 1. In
this paper, we compare the communities without perturbation, A, with the communities after
perturbation, A(p), and we modify the similarity index slightly. We found that I(A, A(p)) is
determined not only by the discrepancy between the communities, but also by the size of the
network and the numbers of communities in A and A(p). To eliminate this influence, we consider
the improved measure

$$S(A, A(p)) = I(A, A(p)) - I(A_{rand}, A_{rand}(p)) \qquad (2)$$

where $A_{rand}$ and $A_{rand}(p)$ have the same numbers of communities as A and A(p),
respectively, and each community in $A_{rand}$ or $A_{rand}(p)$ has the same number of nodes as
the corresponding community in A or A(p). Unlike A and A(p), which are correlated with the
original network, the nodes of the communities in $A_{rand}$ and $A_{rand}(p)$ are selected at
random from the whole set of nodes. In this way, we obtain a series of values of S(A, A(p)) by
increasing the proportion of perturbation from 0 to 1 little by little. In this paper we
increase the proportion of perturbation by 0.02 at each step. In general, a higher proportion
of perturbation corresponds to a lower value of S(A, A(p)). We therefore define our measure as

$$R = \int_0^1 S(p)\, dp \qquad (3)$$



where p is the proportion of perturbation and S(p) is the similarity between the original
community structure and the community structure at perturbation proportion p. If a network has
distinct community structure, R tends to be high; a network with fuzzy community structure
yields a low value. For a random network, R theoretically approaches 0. The similarity is a
function of the parameter p, which measures the amount of perturbation. It starts at 1 when
p = 0, as we would expect for an unperturbed network, then drops off and approaches its minimum
value at p = 1, where the network is a completely random network.
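
The similarity measures used in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions: partitions are given as lists of community labels indexed by node, the normalized mutual information follows the standard confusion-matrix definition of Ref. [19], and the random baseline is obtained by shuffling the label lists, which keeps the number and sizes of communities while assigning nodes at random. The `trials` parameter (averaging the baseline over several shuffles) and the value returned when both partitions consist of a single community are assumptions of this sketch, not choices stated in the paper.

```python
import math
import random
from collections import Counter


def nmi(a, b):
    """Normalized mutual information I(A, B) of two label lists."""
    n = len(a)
    joint = Counter(zip(a, b))  # confusion matrix entries N_ij
    row = Counter(a)            # row sums N_i.
    col = Counter(b)            # column sums N_.j
    num = -2.0 * sum(
        nij * math.log(nij * n / (row[i] * col[j]))
        for (i, j), nij in joint.items()
    )
    den = sum(c * math.log(c / n) for c in row.values())
    den += sum(c * math.log(c / n) for c in col.values())
    if den == 0.0:
        return 1.0  # assumption: two single-community partitions are identical
    return num / den


def corrected_similarity(a, b, rng=None, trials=1):
    """S = I(A, B) - I(A_rand, B_rand), with size-preserving random partitions."""
    rng = rng or random.Random()
    baseline = 0.0
    for _ in range(trials):
        a_rand, b_rand = list(a), list(b)
        rng.shuffle(a_rand)  # same community sizes, random membership
        rng.shuffle(b_rand)
        baseline += nmi(a_rand, b_rand)
    return nmi(a, b) - baseline / trials
```

Here `nmi` returns 1 for identical partitions and 0 for statistically independent ones, so the corrected similarity of a partition with itself is 1 minus the random baseline.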

Is the measure independent of the size of the network, and what happens when the number of
communities changes while the size and the number of edges stay the same? Moreover, does the
measure work well on networks for which the Q index fails? To answer these questions, we first
apply the measure R to networks of the same size with the same number of communities, each
community having the same number of nodes. Each sub-network is an ER network, and there are no
edges between different sub-networks, so the communities are distinct. Numerical experiments
show that the value of R is roughly independent of the size of the network and the number of
communities. When the average degree increases, the value of R increases correspondingly (as
shown in Fig. 1). Second, we compare the Q index and the R index on ER networks. It is known
that the Q index cannot measure ER and BA networks: for BA and ER networks with low average
degree, the modularity Q can be very high. We therefore compare the R index with the Q index on
BA and ER networks of different sizes and average degrees. The results show that the R index
behaves in the same way on BA and ER networks. When the average degree is greater than or equal
to 2, the R index is lower than 0.1 and soon becomes stable. When the average degree is 1, R is
less than 0.2. From the applications to artificial networks below (as shown in Fig. 3), we know
that R < 0.1 is a low value, indicating that there is no community structure in the network.
R = 0.2, however, is not very low; it indicates that fuzzy communities exist in the network.
Hence, our index performs well, but it is not suitable for some networks whose average degree
is less than 1. Fortunately, few real-world networks have an average degree less than 1. By and
large, our index is more effective than the Q index on BA and ER networks (as shown in Fig. 2).
Moreover, numerical experiments show that R can exceed 0.9 for a very large network consisting
of two equal-sized complete cliques as communities, and can fall below 0.03 for large random
networks with a suitable average degree. We therefore conclude that, roughly, $R \in (0, 1)$.



II. RESULT 

To test the validity of our index, we first apply it to computer-generated random networks with
a well-known predetermined community structure. Each network has n = 128 nodes divided into 4
communities of 32 nodes each. Edges between two nodes are introduced with different
probabilities depending on whether the two nodes belong to the same community or not: every
node has on average $\langle k_{in} \rangle$ links to its fellows in the same community and
$\langle k_{out} \rangle$ links to the other communities, keeping
$\langle k_{in} \rangle + \langle k_{out} \rangle = 16$. As is well known, the communities
become more diffuse and harder to identify as $k_{out}$ increases; hence the significance of
the communities found by an algorithm also weakens, and the R index should decrease. To
validate the expectation that the R index becomes lower as $k_{out}$ increases, we calculate
the value of R for $k_{out}$ ranging from 0 to 12. The method we use to detect community
structure in this paper combines Newman's spectral algorithm with the extremal optimization
algorithm: we use the spectral algorithm to detect the initial community structure and
extremal optimization to improve the partition. We also used another algorithm [20] to detect
the community structure and found that the influence of the choice of algorithm on our index is
negligible.

FIG. 1: The relationship among the value of the R index, network size, average degree, and
number of communities. In the plot, 'xn yc' denotes x nodes and y predetermined communities of
the same size. All predetermined communities (sub-networks) are generated in the same way: they
are ER networks and are disconnected from each other. The plot shows that the value of R
increases with the average degree and is almost independent of the size of the network and the
number of communities.

FIG. 2: Comparison of modularity Q and index R on BA and ER networks, in which no community
structure exists. The plot shows that Q is very large when the average degree is about 1, while
the value of R is near 0.2. That is, when the average degree is near 1, the Q index indicates
very strong community structure while R indicates fuzzy community structure (that R = 0.2
corresponds to fuzzy community structure follows from the numerical results on artificial
networks; see Fig. 3). As the average degree increases, Q drops more slowly than R. When the
average degree is greater than or equal to 2, R is very low and soon becomes stable, which
indicates that the R index performs well on both BA and ER networks.

FIG. 3: The x-axis is $k_{out}$ (the proportion of connections between communities), and the
y-axis is the value of the measure R described in this paper. The proportion of perturbation is
increased by 0.02 at each step. Each value for a given $k_{out}$ is the average over 20
numerical experiments, each on a newly generated independent network. The value of R is 0.58
when $k_{out}$ is 0 but 0.05 when $k_{out}$ is 11. When $k_{out}$ is about 11, the network is
theoretically random, with no community structure.

The result is shown in Fig. 3. As anticipated, the value of the index falls from 0.58 to 0.05
as $k_{out}$ varies from 0 to 12, which means that our index is able to mark the significance
of communities in computer-generated networks. For larger values of $k_{out}$ the value of the
index is lower, indicating that the community structure is no more significant than that of a
random graph. The index decreases as a function of $k_{out}$, indicating that the community
structure discovered by the algorithm is relatively significant when $k_{out}$ is relatively
low (or $k_{in}$ is relatively high).
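
The benchmark networks used in this test can be generated along the following lines. This Python sketch assumes the usual construction in which each pair of nodes is linked independently, with probability k_in / 31 inside a community (each node has 31 fellows) and k_out / 96 between communities (each node has 96 non-fellows), so that the expected degrees match the stated averages. The function name and parameters are illustrative and not taken from the paper.

```python
import random


def planted_partition(k_out, n=128, groups=4, k_total=16, rng=None):
    """Generate one benchmark network; returns (edges, labels).

    k_out: expected number of links per node to other communities.
    k_total: expected total degree, so k_in = k_total - k_out.
    Assumes n is divisible by groups.
    """
    rng = rng or random.Random()
    size = n // groups
    if not 0 <= k_out <= k_total or k_total > size - 1:
        raise ValueError("degrees incompatible with community size")
    labels = [i // size for i in range(n)]
    p_in = (k_total - k_out) / (size - 1)  # link probability inside a community
    p_out = k_out / (n - size)             # link probability between communities
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p_in if labels[u] == labels[v] else p_out
            if rng.random() < prob:
                edges.append((u, v))
    return edges, labels
```

Calling `planted_partition(k_out)` for k_out = 0, 1, ..., 12 and averaging the index over several independently generated networks would reproduce the kind of sweep described above.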

We also apply the index to many real networks. A good index should behave well not only on
computer-generated networks but also on real networks, so it is necessary to test our index on
all kinds of real networks. Real networks are usually classified into three sorts: social
networks (such as scientific collaborations and friendships), biological networks (such as
protein interaction networks and metabolic networks), and technological networks (such as the
Internet and the WWW). Distinct communities have been observed in different kinds of networks,
most notably in social networks, while communities in biological networks are often fuzzy. We
apply the index to many different networks and obtain relatively high values in social
networks, which supports the validity of our index. More detail is given in Fig. 4 and Table I.
Fig. 4 shows the curves of S(p) for 4 example networks. The figure is based on 20 simulation
runs, for which we mark the maximum, minimum, and mean values. As shown, the similarity of the
Jazz network decreases slowly, while the similarity of the other three networks decreases
rapidly. The figure indicates that the communities in the Jazz network are more robust than
those in the other three: the structure of the Jazz network hardly changes under perturbation.
Thus the community structure in the Jazz network is distinct and significant.

Table I lists all the networks to which we apply the index. The table shows that different
kinds of networks have different index values, which indicates that the significance of
communities varies between kinds of networks. First, we analyze several social networks,
including the Zachary karate club network, the dolphin network, the college football network,
the Jazz network, scientific collaboration networks, and so on. We obtain relatively high index
values for these networks, most of them 0.27 or higher, which shows that these networks have
strong community structure and that the communities found in them are clear. However, the
Santa Fe scientists collaboration network has an index value of 0.14, which is low. The Santa
Fe Institute differs from many other institutes: the renowned scientists and researchers who
come to it are from universities, government agencies, research institutes, and private
industry. Therefore the relationships between its members are not as tight as in other
collaboration networks. All the social networks in Table I are networks of friendship
(collaboration can be viewed as a form of friendship). As the saying goes, "birds of a feather
flock together", so it is easy to understand why social networks always form distinct groups.

We also analyze some biological networks, such as the protein interaction networks of E. coli,
Yeast, and H. Sapiens, and many metabolic networks. In the protein interaction networks the R
index values are low (the average degree of the H. Sapiens network is less than 2, so its R
index is somewhat higher). For the 43 metabolic networks we analyze, the index values are all
medium, about 0.19; on average these networks have 1488 nodes and 3460 edges, and their average
index value is 0.19. We therefore conclude that some protein interaction networks (such as
E. coli and Yeast) and the metabolic networks listed in Table I have no clear communities. It
may be unnecessary to detect and analyze communities in these networks.
FIG. 4: The x-axis represents the proportion of perturbation, and the y-axis the similarity
value S(p). For each network we mark three values: the maximum, minimum, and mean of 20
simulation runs. From the bottom up, the networks are the protein interaction network
(E. coli), the neural network (C. elegans), the social network (Jazz), and the metabolic
network (Helicobacter pylori). The proportion of perturbation is increased by 0.02 at each
step.



III. CONCLUSION AND DISCUSSION 



In this paper we present an index that measures the significance of detected communities. The
index is based on the similarity between the original community structure and the community
structure after the network is perturbed; its value is the integral of these similarities over
all perturbation strengths. We apply the index to many artificial and real-world networks,
including social networks, a neural network, protein interaction networks, and metabolic
networks. The results show that our index is independent of the network size and the number of
communities. Moreover, we find that different kinds of networks have different characteristics:
social networks usually have significant communities, while communities are comparatively fuzzy
in biological networks, especially in some protein interaction networks.



TABLE I: The integral measure R of some real networks. The table lists each network, its size
(number of nodes and number of edges), the corresponding index value R, and the network type.

Network                            Nodes   Edges   R      Type
E. coli                            1442    5873    0.14   protein
Yeast                              1870    4480    0.14   protein
H. Sapiens                         693     982     0.21   protein
C. elegans metabolic               453     4596    0.19   metabolic
Aquifex aeolicus                   1485    3400    0.19   metabolic
Helicobacter pylori                1363    3151    0.19   metabolic
Yersinia pestis                    1950    4505    0.18   metabolic
43 metabolic networks (average)    1488    3460    0.19   metabolic
C. elegans neural                  297     2359    0.24   neural
Santa Fe scientists                260     2692    0.14   social
Zachary karate                     34      78      0.27   social
Dolphin                            62      159     0.27   social
College football                   115     613     0.38   social
Jazz                               198     5484    0.42   social
Political blogs                    1224    19090   0.29   social
Political books                    105     441     0.34   social